TKLBLIIR: Detecting Twitter Paraphrases with TweetingJay
Abstract
When tweeting on a topic, Twitter users often post messages that convey the same or similar meaning. We describe TweetingJay, a system for detecting paraphrases and semantic similarity of tweets, with which we participated in Task 1 of SemEval 2015. TweetingJay uses a supervised model that combines semantic overlap and word alignment features, previously shown to be effective for detecting semantic textual similarity. TweetingJay reached an F1-score of 65.9% and ranked fourth among the 18 participating systems. We additionally provide an analysis of the dataset and point to some peculiarities of the evaluation setup.
Similar resources
Extracting Lexically Divergent Paraphrases from Twitter
We present MULTIP (Multi-instance Learning Paraphrase Model), a new model suited to identifying paraphrases within the short messages on Twitter. We jointly model paraphrase relations between word and sentence pairs and assume only sentence-level annotations during learning. Using this principled latent variable model alone, we achieve performance competitive with a state-of-the-art method whi...
CDTDS: Predicting Paraphrases in Twitter via Support Vector Regression
In this paper we describe a system that recognizes paraphrases in Twitter for tweets that refer to the same topic. The system participated in Task 1 of SemEval-2015 and uses a support vector regression machine to predict the degree of similarity. The similarity is then thresholded to create a binary prediction. The model and experimental results are discussed along with future work that could im...
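The thresholding step described in this abstract can be illustrated with a minimal sketch. It assumes similarity scores in the range 0 to 1 and a cut-off of 0.5; both are illustrative assumptions, not values taken from the paper, and the function names `threshold_predictions` and `f1_score` are hypothetical. The F1 computation is included only because F1 is the measure reported for the task above.

```python
def threshold_predictions(scores, threshold=0.5):
    """Turn real-valued similarity scores into binary paraphrase labels.

    The 0.5 default is an assumed cut-off for illustration only.
    """
    return [score >= threshold for score in scores]


def f1_score(gold, predicted):
    """Compute F1 for the positive (paraphrase) class."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    if tp == 0:
        # No true positives: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical regression outputs and gold labels for four tweet pairs.
scores = [0.9, 0.2, 0.6, 0.4]
gold = [True, False, False, True]
predicted = threshold_predictions(scores)
print(predicted, f1_score(gold, predicted))
```

In practice the threshold would typically be tuned on held-out data rather than fixed in advance.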
Acquiring Predicate Paraphrases from News Tweets
We present a simple method for ever-growing extraction of predicate paraphrases from news headlines in Twitter. Analysis of the output of ten weeks of collection shows that the accuracy of paraphrases with different support levels is estimated between 60% and 86%. We also demonstrate that our resource is to a large extent complementary to existing resources, providing many novel paraphrases. Our reso...
Gathering and Generating Paraphrases from Twitter with Application to Normalization
We present a new and unique paraphrase resource, which contains meaning-preserving transformations between informal user-generated text. Sentential paraphrases are extracted from a comparable corpus of temporally and topically related messages on Twitter which often express semantically identical information through distinct surface forms. We demonstrate the utility of this new resource on the t...
Twitter Paraphrase Identification with Simple Overlap Features and SVMs
We present an approach to identifying Twitter paraphrases using simple lexical overlap features. The work is part of ongoing research into the applicability of knowledge-lean techniques to paraphrase identification. We utilize features based on overlap of word and character n-grams and train a support vector machine (SVM). Our results demonstrate that character and word level overlap features in c...
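As a rough illustration of what word and character n-gram overlap features might look like, the sketch below computes Jaccard overlap for word unigrams, word bigrams, and character trigrams. The choice of Jaccard similarity, the n-gram orders, whitespace tokenization, and all function names are assumptions made for this example; the abstract does not specify them. The resulting vector is the kind of input that could be passed to an SVM or similar classifier.

```python
def word_ngrams(text, n):
    """Set of word n-grams from a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def char_ngrams(text, n):
    """Set of character n-grams (spaces included) from a lowercased text."""
    s = text.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def overlap_features(tweet1, tweet2):
    """Feature vector: word unigram, word bigram, char trigram overlap."""
    return [
        jaccard(word_ngrams(tweet1, 1), word_ngrams(tweet2, 1)),
        jaccard(word_ngrams(tweet1, 2), word_ngrams(tweet2, 2)),
        jaccard(char_ngrams(tweet1, 3), char_ngrams(tweet2, 3)),
    ]


# Hypothetical tweet pair used only to show the call.
print(overlap_features("the cat sat", "the cat ran"))
```

Character n-grams are often more forgiving of the spelling variation common in tweets than word-level overlap, which is one plausible reason for combining the two.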